Using the SAS® System and SAS® Enterprise Miner™ for Data Mining: A Study of Cancer Survival at Mayo Clinic

Author

  • Leonard Gordon
Abstract

This paper uses data-mining techniques in SAS® to evaluate and predict cancer survival, an epidemiological outcome. The data set, which describes the survival of lung-cancer patients in a Mayo Clinic study, was extracted from the R survival package. Linear and logistic regression models, regression and classification trees, and nearest-neighbor analysis are compared to determine which method best predicts cancer survival. Survival is evaluated through both a continuous and a dichotomous response variable. Linear regression, regression trees, and nearest-neighbor analysis are applied to the continuous response; logistic regression, classification trees, and nearest-neighbor analysis are applied to the dichotomous response. For this data set, the best results come from the dichotomous response variable analyzed with nearest-neighbor analysis (8 neighbors) and with classification trees; both methods achieve a correct classification rate of 78.6%.
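As a rough illustration of the nearest-neighbor approach and the correct classification rate mentioned in the abstract, the following Python sketch classifies a dichotomous response by majority vote among 8 neighbors. It is not the paper's SAS code: the function names, the Euclidean distance, and the toy data are all assumptions made for illustration, and the numbers have no relation to the Mayo Clinic data.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=8):
    """Majority vote among the k training points closest to `query`.

    Assumption: Euclidean distance; with an even k, a tied vote goes to
    the label of the nearer neighbors (first seen in distance order).
    """
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def correct_classification_rate(train_X, train_y, test_X, test_y, k=8):
    """Fraction of test cases whose predicted label equals the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)

# Hypothetical toy data: two well-separated clusters of 10 points each,
# labeled 0 and 1 to stand in for a dichotomous response.
train_X = ([(i * 0.1, i * 0.1) for i in range(10)]
           + [(5 + i * 0.1, 5 + i * 0.1) for i in range(10)])
train_y = [0] * 10 + [1] * 10

# Four made-up test cases; the last one carries a label that disagrees
# with its cluster, so it is counted as a misclassification.
test_X = [(0.2, 0.3), (5.1, 5.4), (0.5, 0.1), (4.8, 5.2)]
test_y = [0, 1, 0, 0]

rate = correct_classification_rate(train_X, train_y, test_X, test_y, k=8)
print(f"correct classification rate: {rate:.1%}")
```

The same rate function could score any classifier that returns one label per test case, which is how the abstract compares nearest-neighbor analysis and classification trees on a common footing.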


Similar resources

160-2011: Time Series Data Mining with SAS® Enterprise Miner™

Traditionally, data mining and time series analysis have been seen as separate approaches to analyzing enterprise data. However, much of the data generated by business processes is time-stamped. Time series data mining is a marriage of forecasting and traditional data mining techniques that uses time dimensions and predictive analytics to make better business decisions. SAS has developed a c...


Feature Extraction Methods for Time Series Data in SAS Enterprise Miner™

Because time series data have a unique data structure, it is not easy to apply some existing data mining tools directly to the data. For example, in classification and clustering problems, each time point is often considered a variable and each time series is considered an observation. As the time dimension increases, the number of variables also increases, in proportion to the time dimension. ...


Using Enterprise Miner™ to Explore and Exploit Drug Discovery Data

One of the biggest challenges in modern pharmaceutical drug discovery is the effective management and exploitation of research data. How can one find those relatively small nuggets of knowledge buried in the great morass of data being generated by High Throughput Screening systems? One potential answer is by using Enterprise Miner™ software! This talk will describe how it can be used to find s...


Predicting Software Outcomes Using Data Mining and Text Mining

Organizations spend a major portion of their Information Technology budget on software maintenance. In this paper, we present a predictive model for the maintenance outcomes of the software projects. We also identify the factors that affect software maintenance outcomes. We build our model using Data Mining (DM) techniques on Open Source Software (OSS) project data. We use the public access to ...



Publication date: 2010